Abstract

In a recent paper the author proved that the matrix of normalized Euclidean distances on a set of
specially distributed random points in the $n$-dimensional Euclidean space $\mathbb{R}^n$ with
independent coordinates converges in probability as $n \to \infty$ to an ultrametric matrix, which is
completely determined by the expectations of the conditional variances of the random coordinates of the
points. The main theorem of the present paper extends this result to the case of weakly correlated
coordinates of random points. Before formulating and proving this result we give two illustrative
examples describing particular algorithms for generating such ultrametric spaces.


1 Introduction 

One of the principal problems encountered in data processing is the problem of structuring objects. As
a rule, each object is represented by some feature vector, so that the data set is a mapping from the set
of objects into the space of feature vectors of these objects. Feature-based structuring of objects can be
performed, for example, using agglomerative algorithms of hierarchic clustering. As a result of such clustering
one obtains a quantitative hierarchic classification of the objects, which is equivalent to the definition of
some indexed hierarchic structure (or an ultrametric structure) on the set of objects [1]. Such an approach is
conventional for problems of classification of objects in terms of the closeness of their feature vectors. It is of
considerable interest to examine data sets relating to a set of objects that are naturally equipped with
an explicit or implicit ultrametric structure. If such an ultrametric structure is explicit, then it
is directly visible in the data sets and may be easily identified by analyzing the metric matrix defined on the
space of feature vectors of the objects. If, however, the ultrametric structure is implicit, then such an
approach may fail to detect it. One possible approach to identifying implicit ultrametric structures is based on
the transition from the initial data set to a different data set (of possibly much smaller dimension), which
corresponds to the choice of new effective variables representing the space of features of the objects.

The principal problem in examining systems with ultrametric structures is to identify the features of
objects responsible for indexing these structures. When applied to specific systems (data sets), such
problems are generally nontrivial and have no universal solution algorithms. Existence of ultrametric structures
in data sets pertaining to systems of various natures was examined in a number of papers (see, for example, 

[H131S])- 

We recall the definition of an ultrametric space. A set $M = \{x\}$ is called a metric space if, for any pair of
elements $x^a, x^b \in M$, there is a distance function $d_{ab} = d(x^a, x^b)$ (a metric) satisfying the following
conditions for any $x^a, x^b, x^c \in M$:

$$1)\ d_{ab} \ge 0, \quad 2)\ d_{ab} = 0 \Leftrightarrow a = b, \quad 3)\ d_{ab} = d_{ba}, \quad 4)\ d_{ab} \le d_{ac} + d_{bc}. \tag{1}$$

A metric $d_{ab}$ satisfying the strong triangle inequality

$$d_{ab} \le \max\{d_{ac}, d_{bc}\} \tag{2}$$

is called an ultrametric. A space with an ultrametric is called a space with ultrametric structure or an
ultrametric space. Endowing some set of objects with an ultrametric structure is equivalent to specifying an
indexed hierarchic structure [1]. An arbitrary real matrix $d = \{d_{ab}\}$ is a metric matrix if its entries satisfy
conditions (1); it is called an ultrametric matrix if its entries satisfy conditions (1) and (2).
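
Conditions (1) and (2) can be checked mechanically for any finite matrix. The following is a minimal sketch, not part of the original paper: the function names `is_metric` and `is_ultrametric` and the tolerance `tol` are illustrative assumptions, and the matrix is assumed to be given as a list of lists.

```python
from itertools import permutations


def is_metric(d, tol=1e-12):
    """Check conditions (1) for a square matrix d (list of lists)."""
    n = len(d)
    for a in range(n):
        for b in range(n):
            if d[a][b] < -tol:                      # 1) non-negativity
                return False
            if (abs(d[a][b]) <= tol) != (a == b):   # 2) zero only on the diagonal
                return False
            if abs(d[a][b] - d[b][a]) > tol:        # 3) symmetry
                return False
    # 4) triangle inequality over all ordered triples of distinct indices
    return all(d[a][b] <= d[a][c] + d[c][b] + tol
               for a, b, c in permutations(range(n), 3))


def is_ultrametric(d, tol=1e-12):
    """Check conditions (1) together with the strong triangle inequality (2)."""
    n = len(d)
    return is_metric(d, tol) and all(
        d[a][b] <= max(d[a][c], d[c][b]) + tol
        for a, b, c in permutations(range(n), 3))
```

For instance, three equally spaced points on a line give a metric matrix that is not ultrametric, whereas an isosceles triangle whose two longest sides are equal passes both checks.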

One area of considerable promise is the search for possible mechanisms explaining the appearance of
ultrametric structures in systems of various natures. An efficient approach for attacking such problems is the
use of the methods of analysis on ultrametric spaces (or ultrametric analysis). The ultrametric approach
has long been a useful tool for solving various problems in the classification of objects and
information processing of data sets [1]. Ultrametric analysis has been considerably advanced during the
last 30 years: it received a great impetus from the pioneering works of researchers from the scientific school
of Academician V. S. Vladimirov, whose efforts were later taken up by a number of researchers from
different scientific schools (for an overview, consult, for example, [3). This relatively new research field is 
now known as the “p-adic and ultrametric mathematical physics” and is blessed by a number of books and 
an immense number of research articles in the area of p-adic analysis, p-adic mathematical physics and their 
applications to modeling in various areas of physics, biology, computer science, sociology, physiology, etc. 
(see [S1I1I7] and the references given therein). 

In a number of studies dedicated to the application of methods of the ultrametric analysis to real systems 
it was noted several times (see, for example, [TJIllIHlinilin]) that the correlation coefficients of sparse data sets 
of large dimension have ultrametric properties. More exactly, it was shown that the matrices of normalized 
Euclidean distances between randomly distributed points in a multivariate space become close to ultrametric 
matrices as the dimension of the space increases. The probabilistic justification of this fact based on the 
laws of large numbers was put forward in muni for groups of random points with the same distribution 
in a multivariate space. In the recent paper [11] we formulated and proved a general theorem stating that
the matrix of normalized Euclidean distances on the set of specially distributed random points in the
$n$-dimensional Euclidean space $\mathbb{R}^n$ with independent coordinates converges in probability as $n \to \infty$ to an
ultrametric matrix. The entries of this ultrametric matrix were given explicitly, and moreover, were shown
to be completely determined by the expectations of the conditional variances of the coordinates of random
points. In the present paper we extend the results of [11] and formulate and prove an analogous theorem for
the case of correlated coordinates of random points.

The paper is organized as follows. In the next section, following [11], we give an illustrative example
describing one particular algorithm for constructing an ultrametric space by generating independent randomly
distributed points in the $n$-dimensional Euclidean space with independent coordinates. In Section 3 we
provide a different illustrative example of the construction of an ultrametric space by generating random points
in the $n$-dimensional Euclidean space, in which the coordinates of random points are correlated in a special
way. In Section 4 we formulate and prove the main theorem to the effect that the metric matrix of a set of
independent random points with correlated coordinates and a special distribution in the $n$-dimensional
Euclidean space converges in probability, under certain conditions, as $n \to \infty$ to an ultrametric matrix. Here,
the ultrametric matrix is given in an explicit form, which is determined by the expectations of the conditional
variances of the random coordinates of the points.


2 Illustrative example I. Independent coordinates 

We shall consider the following algorithm for generating independent randomly distributed points in the
$n$-dimensional Euclidean space with independent coordinates. This algorithm comprises the following
$N$-step procedure.

Let $p_1, p_2, \ldots, p_N$ be a sequence of natural numbers. At the first step we generate $p_1$ independent
random points $x^{(a_1)} = \left(x_1^{(a_1)}, \ldots, x_n^{(a_1)}\right)$ ($a_1 = 1, 2, \ldots, p_1$) in the $n$-dimensional space $\mathbb{R}^n$
with normal distribution $\mathcal{N}(0, \sigma_1)$ for each independent coordinate. The coordinates of different points
are also assumed to be independent. At the second step we generate $p_1 p_2$ independent random points
$x^{(a_1 a_2)} = \left(x_1^{(a_1 a_2)}, \ldots, x_n^{(a_1 a_2)}\right)$ (here, $a_1 = 1, 2, \ldots, p_1$, $a_2 = 1, 2, \ldots, p_2$, and $a_1 a_2$ is a two-dimensional
index) in $\mathbb{R}^n$ with normal distribution $\mathcal{N}\left(x_i^{(a_1)}, \sigma_2\right)$ for the $i$th coordinate. Next, at the third step we
generate $p_1 p_2 p_3$ independent random points $x^{(a_1 a_2 a_3)} = \left(x_1^{(a_1 a_2 a_3)}, \ldots, x_n^{(a_1 a_2 a_3)}\right)$ ($a_1 = 1, 2, \ldots, p_1$,
$a_2 = 1, 2, \ldots, p_2$, $a_3 = 1, 2, \ldots, p_3$) in $\mathbb{R}^n$ with normal distribution $\mathcal{N}\left(x_i^{(a_1 a_2)}, \sigma_3\right)$ for the $i$th coordinate, and so
on. We repeat this procedure of generation of random points $N$ times. At the last, $N$th, step we
generate $p_1 p_2 \cdots p_N$ independent random points $x^{(a_1 a_2 \cdots a_N)} = \left(x_1^{(a_1 a_2 \cdots a_N)}, \ldots, x_n^{(a_1 a_2 \cdots a_N)}\right)$ ($a_1 = 1, 2, \ldots, p_1$,
$a_2 = 1, 2, \ldots, p_2$, ..., $a_N = 1, 2, \ldots, p_N$) in $\mathbb{R}^n$ with normal distribution $\mathcal{N}\left(x_i^{(a_1 a_2 \cdots a_{N-1})}, \sigma_N\right)$
for the $i$th coordinate.

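The set of points $M_n^{(N)} = \left\{x^{(a_1 a_2 \cdots a_N)}\right\}$ forms a metric space with the metric

$$d_n\left(x^{(a_1 a_2 \cdots a_N)}, x^{(b_1 b_2 \cdots b_N)}\right) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2}. \tag{3}$$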
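
The $N$-step procedure and the metric (3) can be sketched in a few lines of code. This is only an illustrative sketch under stated assumptions: the names `generate_points` and `distance_matrix` are hypothetical, each $\sigma_k$ is interpreted as a standard deviation (the text calls it a variance but later uses $\sigma_k^2$ for variances), and each point of step $k$ is assumed to be centred on its parent point from step $k-1$.

```python
import math
import random


def generate_points(n, p, sigmas, rng):
    """Return {index tuple (a1, ..., aN): point in R^n} after the last step.

    Assumption: every child point is drawn coordinate-wise from a normal law
    centred on its parent, with standard deviation sigmas[k] at step k + 1.
    """
    level = {(): [0.0] * n}  # fictitious common parent at the origin
    for p_k, s_k in zip(p, sigmas):
        level = {
            idx + (a,): [rng.gauss(c, s_k) for c in parent]
            for idx, parent in level.items()
            for a in range(1, p_k + 1)
        }
    return level


def distance_matrix(points):
    """Matrix of normalized Euclidean distances (3), rows ordered by index tuple."""
    keys = sorted(points)
    n = len(points[keys[0]])

    def dist(x, y):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)) / n)

    return keys, [[dist(points[a], points[b]) for b in keys] for a in keys]


rng = random.Random(0)
pts = generate_points(2000, (2, 2, 2), (10.0, 10.0, 10.0), rng)
keys, D = distance_matrix(pts)
```

Under these assumptions, two points that differ only in the last index have a normalized distance close to $\sqrt{2}\,\sigma_3$ for large $n$, which is the behaviour the examples in this section illustrate.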


The natural question here is how close the metric matrix (3) is to an ultrametric one for various $n$. Below
we give the results of numerical simulations in which the number of steps of the procedure is $N = 3$.

We let $d_n^{p_1, p_2, \ldots, p_N}(N)$ denote the metric matrix that was numerically generated in accordance with
the above procedure, where $N$ is the number of steps in the procedure and $p_i$, $i = 1, 2, \ldots, N$, is the number
of points at the $i$th step of the procedure. Below we give the results of calculation of an arbitrary random
realization of the metric matrix (3) with fixed values of the variance $\sigma_1 = \sigma_2 = \sigma_3 = 10$ and when

the dimensions n of the space M" are, respectively, n = 10 , n = 10 ^ and n = 10 ^. 
It is seen that with increasing $n$ the random realization of the matrix $d_n^{2,2,2}(3)$ becomes increasingly
close to an ultrametric matrix. Using the results of [11] one may show that as $n \to \infty$ the random realization
For any finite metric space $\mathcal{U}$ one may define the so-called “degree of ultrametricity” or the “ultrametric
measure” of the space $\mathcal{U}$ as some value which quantitatively determines the closeness of the metric matrix of
the space $\mathcal{U}$ to an ultrametric matrix. Various approaches to the definition of the degree of ultrametricity
were discussed in a series of papers [a [g HU na [El E]. In the present paper we shall use the definition 
proposed in M- We recall this definition. 

Let $\mathcal{U}$ be a finite metric space with elements $x^a$, $a = 1, 2, \ldots, N$, and let $d_{ab} = d(x^a, x^b)$ be a metric on $\mathcal{U}$.
For any three points (triangle) $\{x^a, x^b, x^c\}$ in $\mathcal{U}$ we consider the function

$$u(a, b, c) = 2\,\frac{\operatorname{mid}\{d_{ab}, d_{bc}, d_{ca}\}}{\max\{d_{ab}, d_{bc}, d_{ca}\}} - 1,$$

where $\operatorname{mid}\{d_{ab}, d_{bc}, d_{ca}\}$ is the length of the middle-length side of the triangle $\{x^a, x^b, x^c\}$. The quantity
$u(a, b, c)$ will be called the degree of ultrametricity of the triangle $\{x^a, x^b, x^c\}$. The degree of ultrametricity
of a metric space $\mathcal{U}$ is the number $U$, which is defined as the average of $u(a, b, c)$ over all possible distinct
triangles $\{x^a, x^b, x^c\}$ in $\mathcal{U}$:

$$U = \frac{3!\,(N-3)!}{N!}\sum_{a=1}^{N}\ \sum_{b=a+1}^{N}\ \sum_{c=b+1}^{N} u(a, b, c).$$

We note that $0 \le U \le 1$, where $U = 1$ if $\mathcal{U}$ is an ultrametric space.
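
The definition above translates directly into code. The sketch below is illustrative only: the function name `degree_of_ultrametricity` is an assumption, the matrix is assumed to be a list of lists describing at least three pairwise distinct points, and the average is taken over all unordered triples, which is what the normalizing factor $3!\,(N-3)!/N!$ expresses.

```python
from itertools import combinations


def degree_of_ultrametricity(d):
    """Average of u(a, b, c) = 2 * mid / max - 1 over all triangles of d."""
    triangles = list(combinations(range(len(d)), 3))
    total = 0.0
    for a, b, c in triangles:
        sides = sorted((d[a][b], d[b][c], d[c][a]))
        total += 2.0 * sides[1] / sides[2] - 1.0  # middle side over longest side
    return total / len(triangles)
```

An ultrametric triangle (isosceles with the two longest sides equal) gives $u = 1$, while three equally spaced collinear points give $u = 0$, the smallest value allowed by the ordinary triangle inequality.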

Figure 1 shows the pointwise dependence of the degree of ultrametricity $U$ for a random realization
of the space of points generated by the above procedure versus the dimension $n$ of the Euclidean space $\mathbb{R}^n$.
The following parameters were chosen: $N = 3$, $p_1 = p_2 = p_3 = 2$, $\sigma_1 = \sigma_2 = \sigma_3 = 10$. In this graph, to
each value of n (in the range between n = 4 and n = 10^) there corresponds one point associated with one 
arbitrary realization of the metric matrix $d_n^{2,2,2}(3)$.


3 Illustrative example II. Dependent coordinates 

In this section we shall consider an algorithm for generating random points with dependent coordinates
in the $n$-dimensional Euclidean space; this algorithm is a modification of the algorithm considered in the
previous section. This modification ensures that any two of the $n$ coordinates of each
random point are nontrivially correlated. For our purposes it will be convenient to use the so-called hierarchic
(cluster) correlation of coordinates of each random point, whose construction is described below.
In this construction the entire family of coordinates of each random point is partitioned into hierarchically
nested groups, which will be called clusters. For such a partition each coordinate can be associated with
an element of some finite set $M$ (a finite ultrametric space). Under this approach we shall assume that the
covariance of two coordinates is defined by the ultrametric distance between the elements of the space $M$ that
correspond to these coordinates. It is worth pointing out that the ultrametric on the space $M$ associated
with the set of coordinates of the $n$-dimensional Euclidean space $\mathbb{R}^n$ has no relation to the ultrametric that
arises on the set of random points in $\mathbb{R}^n$. The only reason for us to introduce an ultrametric on the space $M$
is to provide for a nontrivial correlation between the coordinates of each random point of $\mathbb{R}^n$.
We set $n = m^k$, where $m$ and $k$ are positive integers. The entire set of coordinates $\{x_1, x_2, \ldots, x_n\}$
will be called the zero level cluster. We partition the zero level cluster $\{x_1, x_2, \ldots, x_n\}$ into $m$ groups

$$\{x_1, x_2, \ldots, x_{m^{k-1}}\},\ \{x_{m^{k-1}+1}, x_{m^{k-1}+2}, \ldots, x_{2m^{k-1}}\},\ \ldots,\ \{x_{(m-1)m^{k-1}+1}, x_{(m-1)m^{k-1}+2}, \ldots, x_{m^k}\},$$

which will be called the first level clusters and which will be indexed by an index $i_1 = 1, 2, \ldots, m$. In turn,
we partition each first level cluster into $m$ subgroups, which will be called the second level clusters. For
example, we partition the cluster $\{x_1, x_2, \ldots, x_{m^{k-1}}\}$ into the subclusters

$$\{x_1, x_2, \ldots, x_{m^{k-2}}\},\ \{x_{m^{k-2}+1}, x_{m^{k-2}+2}, \ldots, x_{2m^{k-2}}\},\ \ldots,\ \{x_{(m-1)m^{k-2}+1}, x_{(m-1)m^{k-2}+2}, \ldots, x_{m^{k-1}}\}.$$

Second level clusters will be indexed by a two-dimensional index $(i_1 i_2)$, $i_1, i_2 = 1, 2, \ldots, m$, where $i_1$ is the
number of the first level cluster which contains the second level cluster and $i_2$ is the number of the second level cluster
inside the $i_1$th first level cluster. Next, we partition each second level cluster into $m$ subgroups, which will
be called third level clusters. Third level clusters will be indexed by a three-dimensional index $(i_1 i_2 i_3)$,
$i_1, i_2, i_3 = 1, 2, \ldots, m$. We continue this process of partition until we get the $(k-1)$th level clusters, each of which
contains $m$ coordinates. Thus, under such a partition all the coordinates are united into hierarchically
nested clusters. Moreover, each coordinate which lies in the $i_1$th first level cluster, in the $i_2$th second level
cluster, ..., and in the $i_{k-1}$th $(k-1)$th level cluster can be indexed by a multiindex $(i_1, i_2, \ldots, i_k)$,
$i_1, i_2, \ldots, i_k = 1, 2, \ldots, m$.

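Let $\xi_{i_1 i_2 \ldots i_k}, \xi_{i_1 i_2 \ldots i_{k-1}}, \ldots, \xi_{i_1 i_2}, \xi_{i_1}$ be families of independent random variables distributed according
to the normal distribution law $\mathcal{N}(0, \sigma)$. We define the random coordinates of a point $x$ in $\mathbb{R}^n$, $n = m^k$, as

$$x_{i_1 i_2 \ldots i_k} = \xi_{i_1 i_2 \ldots i_k} + \lambda^{-1}\xi_{i_1 i_2 \ldots i_{k-1}} + \cdots + \lambda^{-(k-2)}\xi_{i_1 i_2} + \lambda^{-(k-1)}\xi_{i_1}, \tag{4}$$

where $\lambda$ is the control correlation parameter of coordinates of random points. It is easily seen that the
expectation and variance of (4) are as follows:

$$E\left[x_{i_1 i_2 \ldots i_k}\right] = 0, \qquad \mathrm{Var}\left[x_{i_1 i_2 \ldots i_k}\right] = \sigma^2\sum_{i=0}^{k-1}\lambda^{-2i} = \sigma^2\,\frac{1 - \lambda^{-2k}}{1 - \lambda^{-2}}.$$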
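
To make the hierarchic construction concrete, the following sketch generates the $n = m^k$ coordinates of one random point from independent normal variables attached to the nested clusters. It is an illustration under explicit assumptions, not the author's code: the names `correlated_coordinates` and `coordinate_variance` are hypothetical, $\sigma$ is treated as a standard deviation, and the variable shared by a level-$s$ cluster is assumed to enter with weight $\lambda^{-(k-s)}$, so that the coordinate's own variable has weight 1.

```python
import random
from itertools import product


def correlated_coordinates(m, k, sigma, lam, rng):
    """Return {multiindex (i1, ..., ik): coordinate value} for one random point.

    Assumption: the variable attached to the cluster prefix (i1, ..., is)
    is shared by all coordinates in that cluster and has weight lam ** -(k - s).
    """
    xi = {}      # one N(0, sigma) variable per cluster prefix
    coords = {}
    for idx in product(range(1, m + 1), repeat=k):
        total = 0.0
        for s in range(1, k + 1):
            prefix = idx[:s]
            if prefix not in xi:
                xi[prefix] = rng.gauss(0.0, sigma)
            total += lam ** (-(k - s)) * xi[prefix]
        coords[idx] = total
    return coords


def coordinate_variance(k, sigma, lam):
    """Variance of a single coordinate: sigma^2 * sum_{i=0}^{k-1} lam^(-2 i)."""
    return sigma ** 2 * sum(lam ** (-2 * i) for i in range(k))


point = correlated_coordinates(m=2, k=3, sigma=10.0, lam=2.0, rng=random.Random(1))
```

For large $\lambda$ the shared terms are suppressed and the coordinates become nearly independent, while for $\lambda$ close to 1 coordinates in the same cluster are strongly correlated.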


One can also easily calculate the covariance of two random coordinates of (|T]) of the form Xi.ii 2 ...ir-iirir+i---ik 
and Xi^i 2 ...ir.-ijrjr+i...ok for various indexes irir+i ■ ■ - ik and jrjr+i ■ ■ -jk- 


k-l 


COV ^Xi^i2...ir-lir-ir+l---ik7 Xili2---ir-ljrjr+l---jk] ~ ^ ^ A ^ — CT A 

j=r 

We note that, for A > 1, 

„2 

Var[x,i,,,„,J 


-2r 


I _ 

1 - A-2 


ri—>oo 1 — A ^ ' 
COV [Xi,^i2...ir-ljr'ir-+l...jfc ) j ,, .j,, ] 


n—¥oo 1 — \ 


-2 ■ 


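Let $p_1, p_2, \ldots, p_N$ be a set of natural numbers. We generate $p_1$ independent random points $x^{(a_1)}$
($a_1 = 1, 2, \ldots, p_1$) in the $n$-dimensional space $\mathbb{R}^n$ so that the coordinates $x_{i_1 i_2 \ldots i_k}^{(a_1)}$ of each point $x^{(a_1)}$ are
defined as $x_{i_1 i_2 \ldots i_k}^{(a_1)} = y_{i_1 i_2 \ldots i_k}^{(a_1)}$, where $y_{i_1 i_2 \ldots i_k}^{(a_1)}$ are defined by (4). Next, we generate $p_1 p_2$ independent random
points $x^{(a_1 a_2)}$ (here, $a_1 = 1, 2, \ldots, p_1$, $a_2 = 1, 2, \ldots, p_2$, and $a_1 a_2$ is a two-dimensional index) in $\mathbb{R}^n$,
with each random coordinate defined as $x_{i_1 i_2 \ldots i_k}^{(a_1 a_2)} = x_{i_1 i_2 \ldots i_k}^{(a_1)} + y_{i_1 i_2 \ldots i_k}^{(a_1 a_2)}$, where $y_{i_1 i_2 \ldots i_k}^{(a_1 a_2)}$ are defined
by (4). Next, we generate $p_1 p_2 p_3$ independent random points $x^{(a_1 a_2 a_3)}$ ($a_1 = 1, 2, \ldots, p_1$, $a_2 = 1, 2, \ldots, p_2$,
$a_3 = 1, 2, \ldots, p_3$) in $\mathbb{R}^n$, for which $x_{i_1 i_2 \ldots i_k}^{(a_1 a_2 a_3)} = x_{i_1 i_2 \ldots i_k}^{(a_1 a_2)} + y_{i_1 i_2 \ldots i_k}^{(a_1 a_2 a_3)}$, where $y_{i_1 i_2 \ldots i_k}^{(a_1 a_2 a_3)}$ are defined as in (4), and
so on. We repeat this procedure of generation of random points $N$ times. At the last step, we generate
$p_1 p_2 \cdots p_N$ independent random points $x^{(a_1 a_2 \cdots a_N)}$ ($a_1 = 1, 2, \ldots, p_1$, $a_2 = 1, 2, \ldots, p_2$, ..., $a_N = 1, 2, \ldots, p_N$) in $\mathbb{R}^n$ for
which $x_{i_1 i_2 \ldots i_k}^{(a_1 a_2 \cdots a_N)} = x_{i_1 i_2 \ldots i_k}^{(a_1 a_2 \cdots a_{N-1})} + y_{i_1 i_2 \ldots i_k}^{(a_1 a_2 \cdots a_N)}$, where $y_{i_1 i_2 \ldots i_k}^{(a_1 a_2 \cdots a_N)}$ are defined as in (4). The set of points
$U_n^{(N)} = \left\{x^{(a_1 a_2 \cdots a_N)}\right\}$ forms a metric space with the metric

$$d_n\left(x^{(a_1 a_2 \cdots a_N)}, x^{(b_1 b_2 \cdots b_N)}\right) = \sqrt{\frac{1}{n}\sum_{i_1, i_2, \ldots, i_k = 1}^{m}\left(x_{i_1 i_2 \ldots i_k}^{(a_1 a_2 \cdots a_N)} - x_{i_1 i_2 \ldots i_k}^{(b_1 b_2 \cdots b_N)}\right)^2}. \tag{5}$$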


We next give the results of numerical simulations studying the dependence of the ultrametricity
degree $U$ of the metric space $U_n^{(N)}$ on the dimension $n$ of the Euclidean space $\mathbb{R}^n$ for various values of the
correlation parameter $\lambda$ of the coordinates. Clearly, in the limit $\lambda \to \infty$ we have the case of independent
coordinates examined in the previous section, and hence we shall be concerned with the case when $\lambda$ is close
to 1. We choose the following values of the dimension: $n = 2^i$, $i = 2, \ldots, 11$. Next, for each such $n$
we generate 10 realizations of the $8 \times 8$ metric matrix $d_n\left(x^{(a_1 a_2 a_3)}, x^{(b_1 b_2 b_3)}\right)$, $a_i, b_i = 1, 2$, with the following
fixed values of the parameters: $m = 2$, $N = 3$, $p_1 = p_2 = p_3 = 2$, $\sigma = 10$, and with $\lambda = 0.8$, $\lambda = 1.2$, $\lambda = 2$ and
$\lambda = 10$, respectively. Next, for each realization we calculate the degree of ultrametricity $U$, and then calculate
the average $\bar{U}$ of the degree of ultrametricity $U$ over all 10 realizations. Figure 2 depicts the pointwise graphs
of $\bar{U}(n)$ for various $\lambda$. These calculations show that, for weakly correlated coordinates of random points
($\lambda > 1$), the degree of ultrametricity $U$ of random realizations of the metric matrix $d_n$
tends in probability to 1 with increasing $n$ (see the points corresponding to $\lambda = 1.2$, $\lambda = 2$, and $\lambda = 10$
in Fig. 2). By contrast, for strong correlations of coordinates of random points ($\lambda < 1$), the degree of
ultrametricity of the random realizations of the metric matrix does not increase with increasing $n$ (see
the points corresponding to $\lambda = 0.8$ in Fig. 2). In the next section we shall formulate and prove the main
theorem, which gives conditions on the correlation functions of coordinates of random points ensuring that
the metric matrix of these points converges in probability to an ultrametric matrix in the limit $n \to \infty$.


4 The main theorem 


First we need some results from probability that will be used to prove the main theorem.

Let $\{\Omega, \Sigma, P\}$ be a probability space, where $\{\Omega, \Sigma\}$ is a measurable space and $P$ is a probability measure. A real
random variable $X$ is a measurable mapping $X : \Omega \to \mathbb{R}$. For any real random variable $X = X(\omega)$ one may
define the integral $\int_A X(\omega)\,dP(\omega)$, $A \in \Sigma$. The expectation and the variance of $X$ are defined, respectively,
as $E[X] = \int_\Omega X(\omega)\,dP(\omega)$ and $\mathrm{Var}[X] = E[X^2] - (E[X])^2$. Let $\Sigma^{(1)} \subset \Sigma$ be a $\sigma$-subalgebra of $\Sigma$; then
the conditional expectation $E\left[X \mid \Sigma^{(1)}\right]$ of a real random variable $X$ is a random variable $Y$ such that $Y$ is
$\Sigma^{(1)}$-measurable and, for all $A \in \Sigma^{(1)}$, $\int_A X(\omega)\,dP(\omega) = \int_A Y(\omega)\,dP(\omega)$, and the conditional variance
$\mathrm{Var}\left[X \mid \Sigma^{(1)}\right]$ is defined as $\mathrm{Var}\left[X \mid \Sigma^{(1)}\right] = E\left[X^2 \mid \Sigma^{(1)}\right] - \left(E\left[X \mid \Sigma^{(1)}\right]\right)^2$.


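Theorem 1. (Markov's theorem, [15]) Let $X_1, X_2, \ldots$ be a sequence of dependent random variables with
finite expectations $E[X_i] = m_i$, let $S_n = X_1 + X_2 + \cdots + X_n$, and let $\frac{\mathrm{Var}[S_n]}{n^2} \to 0$ as $n \to \infty$.
Then $\frac{S_n}{n} - \frac{E[S_n]}{n} \xrightarrow{P} 0$, i.e., for any $\varepsilon > 0$ one has

$$P\left(\left|\frac{S_n}{n} - \frac{E[S_n]}{n}\right| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty$$

(the convergence in probability).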


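Theorem 2. (Slutsky's theorem, [16, 17]) If $X_n^{(1)} \xrightarrow{P} X^{(1)}$, $X_n^{(2)} \xrightarrow{P} X^{(2)}$, ..., $X_n^{(N)} \xrightarrow{P} X^{(N)}$, and
$h$ is a continuous function of $N$ variables, then

$$h\left(X_n^{(1)}, X_n^{(2)}, \ldots, X_n^{(N)}\right) \xrightarrow{P} h\left(X^{(1)}, X^{(2)}, \ldots, X^{(N)}\right).$$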

The main theorem is as follows. 











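Theorem 3. Let $\{\Omega, \Sigma, P\}$ be a probability space and let $\Sigma^{(1)} \subset \Sigma^{(2)} \subset \cdots \subset \Sigma^{(N)} = \Sigma$ be an increasing
sequence of $\sigma$-subalgebras. Let $M_n = \left\{x^{(a_1 a_2 \cdots a_N)}\right\}$ ($a_1 = 1, 2, \ldots, p_1$, $a_2 = 1, 2, \ldots, p_2$, ..., $a_N = 1, 2, \ldots, p_N$,
and $a_1 a_2 \cdots a_N$ is an $N$-dimensional index) be sets of $p_1 p_2 \cdots p_N$ independent random points
$x^{(a_1 a_2 \cdots a_N)} = \left(x_1^{(a_1 a_2 \cdots a_N)}, \ldots, x_n^{(a_1 a_2 \cdots a_N)}\right)$ in $\mathbb{R}^n$ with dependent coordinates. Assume that
the conditional expectations $E\left[x_i^{(a_1 a_2 \cdots a_N)} \mid \Sigma^{(k)}\right] \equiv x_i^{(a_1 a_2 \cdots a_k)}$ ($k = 1, 2, \ldots, N-1$) are identical for all
$a_{k+1}, a_{k+2}, \ldots, a_N$. Next, assume that $E\left[\left(x_i^{(a_1 a_2 \cdots a_N)}\right)^m\right]$, $m = 1, 2, 3, 4$, and
$\mathrm{cov}\left[\left(x_i^{(a_1 a_2 \cdots a_N)}\right)^{l_1}, \left(x_j^{(a_1 a_2 \cdots a_N)}\right)^{l_2}\right]$
are uniformly bounded, $E\left[x_i^{(a_1)}\right] = m_i$, $\mathrm{Var}\left[x_i^{(a_1)}\right] = \sigma_1^2$, $E\left[\mathrm{Var}\left[x_i^{(a_1 a_2 \cdots a_{k+1})} \mid \Sigma^{(k)}\right]\right] = \sigma_{k+1}^2$
($k = 1, \ldots, N-1$) and the conditions

$$\lim_{n \to \infty}\frac{1}{n^2}\sum_{\substack{i,j=1 \\ i<j}}^{n}\mathrm{cov}\left[\left(x_i^{(a_1 a_2 \cdots a_N)}\right)^{l_1}, \left(x_j^{(a_1 a_2 \cdots a_N)}\right)^{l_2}\right] = 0, \qquad l_1, l_2 = 1, 2$$
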

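hold. Then the metric on $M_n$

$$d_n\left(x^{(a_1 a_2 \cdots a_N)}, x^{(b_1 b_2 \cdots b_N)}\right) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2}$$

has the property

$$d_n\left(x^{(a_1 a_2 \cdots a_N)}, x^{(b_1 b_2 \cdots b_N)}\right) \xrightarrow{P} d_{a_1 a_2 \cdots a_N,\, b_1 b_2 \cdots b_N}$$

as $n \to \infty$, where

$$d_{a_1 a_2 \cdots a_N,\, b_1 b_2 \cdots b_N} = \sqrt{2\left(\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_N b_N}\right)\sigma_N^2 + \left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_{N-1} b_{N-1}}\right)\sigma_{N-1}^2 + \cdots + \left(1 - \delta_{a_1 b_1}\right)\sigma_1^2\right)}$$

is an ultrametric $p_1 p_2 \cdots p_N \times p_1 p_2 \cdots p_N$ matrix.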
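
The limiting matrix of Theorem 3 depends only on the numbers $p_k$ and $\sigma_k$, so it can be tabulated directly. The sketch below is illustrative: the function name `limit_matrix` is an assumption, and the product of Kronecker deltas $\delta_{a_1 b_1}\cdots\delta_{a_k b_k}$ is evaluated as the test "the first $k$ indices of the two points coincide".

```python
import math
from itertools import product


def limit_matrix(p, sigmas):
    """Limiting distance matrix of Theorem 3 for index ranges p and values sigma_k."""
    keys = list(product(*(range(1, p_k + 1) for p_k in p)))

    def entry(a, b):
        total = 0.0
        for k in range(1, len(p) + 1):
            # 1 - delta_{a1 b1} ... delta_{ak bk} equals 1 iff the prefixes differ
            if a[:k] != b[:k]:
                total += sigmas[k - 1] ** 2
        return math.sqrt(2.0 * total)

    return keys, [[entry(a, b) for b in keys] for a in keys]


keys, d = limit_matrix((2, 2), (3.0, 4.0))
```

For $N = 2$ with $\sigma_1 = 3$ and $\sigma_2 = 4$, points sharing the first index are at distance $\sqrt{2 \cdot 16}$ and points with different first indices are at distance $\sqrt{2 \cdot (9 + 16)}$, so every triangle is isosceles with the two longest sides equal, as the strong triangle inequality requires.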

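Proof. First we prove by induction that

$$E\left[\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2\right] = 2\left(\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_N b_N}\right)\sigma_N^2 + \left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_{N-1} b_{N-1}}\right)\sigma_{N-1}^2 + \cdots + \left(1 - \delta_{a_1 b_1}\right)\sigma_1^2\right). \tag{6}$$

For the case $N = 2$ we have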

E = 

= il-da,bja2b2)^ E = 

= il-Sa,b^Sa2b2)^ E + 

+ E E(^) -2E E(i) E E(^) = 


= (1 - 5a,b,5a2b2) E [Var E^^) ] + Var [( 


e(i) + 


+ (l-<5ai6i5a2b2)(l-^aibj(E[(xf^“^^) E« ] -E 



= 

= (l-< 5 ai 6 i^a 262 )(E[Var[(x^“=^) E«]] +E[var 



4- 
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-(1) 


+ Var 


{bib2) 


+ (1 - (5aihi) [yar 


5](i) 


+ 


$$= 2\left(\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\right)\sigma_2^2 + \left(1 - \delta_{a_1 b_1}\right)\sigma_1^2\right).$$

We suppose that (6) is true for $N = k$ and then show that (6) is true for $N = k + 1$:

2" 


f^(“i“2'"£ifc+l) _ ^{bib2---bk+i)' 


(l <iaibi<ia2&2 ' ■ ■ ^afc+lbfc+l) ^ 
= (l ~ <iaibi^a2&2 ■ ■ ■ ^afc+ibfc+i) E 

+ Var 

+ ( 1 - ^aibi^a2b2 


E 

Var 


(aia2---afc+i) 


(ai02---afc+i) 


^(k) 


+ 


X] 


(bib2'"bfc+i) 






(aia2---ak) (bib2-■ ■bk)'' 


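$$= 2\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_{k+1} b_{k+1}}\right)\sigma_{k+1}^2 + \left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_k b_k}\right) E\left[\left(x_i^{(a_1 a_2 \cdots a_k)} - x_i^{(b_1 b_2 \cdots b_k)}\right)^2\right]$$

$$= 2\left(\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_{k+1} b_{k+1}}\right)\sigma_{k+1}^2 + \left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_k b_k}\right)\sigma_k^2 + \cdots + \left(1 - \delta_{a_1 b_1}\right)\sigma_1^2\right).$$

This proves equation (6).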
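Next let us consider

$$\mathrm{Var}\left[\sum_{i=1}^{n}\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2\right] = \sum_{i=1}^{n}\mathrm{Var}\left[\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2\right] + 2\sum_{\substack{i,j=1 \\ i<j}}^{n}\mathrm{cov}\left[\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2, \left(x_j^{(a_1 a_2 \cdots a_N)} - x_j^{(b_1 b_2 \cdots b_N)}\right)^2\right].$$

Let us denote $x_i \equiv x_i^{(a_1 a_2 \cdots a_N)}$, $y_i \equiv x_i^{(b_1 b_2 \cdots b_N)}$, and let $M[\cdot]$ denote the expectation. Then

$$\sum_{i=1}^{n}\mathrm{Var}\left[\left(x_i - y_i\right)^2\right] = \sum_{i=1}^{n}\left(M\left[x_i^4\right] + M\left[y_i^4\right] - 4M\left[x_i^3\right]M\left[y_i\right] - 4M\left[y_i^3\right]M\left[x_i\right]\right) + \sum_{i=1}^{n}\left(4M\left[x_i^2\right]M\left[y_i^2\right] - \left(M\left[x_i^2\right]\right)^2 - \left(M\left[y_i^2\right]\right)^2\right) + \sum_{i=1}^{n} 4M\left[x_i\right]M\left[y_i\right]\left(M\left[x_i^2\right] + M\left[y_i^2\right] - M\left[x_i\right]M\left[y_i\right]\right),$$

$$\sum_{\substack{i,j=1 \\ i<j}}^{n}\mathrm{cov}\left[\left(x_i - y_i\right)^2, \left(x_j - y_j\right)^2\right] = \sum_{\substack{i,j=1 \\ i<j}}^{n}\left(\mathrm{cov}\left[x_i^2, x_j^2\right] + \mathrm{cov}\left[y_i^2, y_j^2\right] - 2M\left[x_i\right]\mathrm{cov}\left[y_i, y_j^2\right] - 2M\left[x_j\right]\mathrm{cov}\left[y_i^2, y_j\right]\right)$$

$$- 2\sum_{\substack{i,j=1 \\ i<j}}^{n}\left(M\left[y_i\right]\mathrm{cov}\left[x_i, x_j^2\right] + M\left[y_j\right]\mathrm{cov}\left[x_i^2, x_j\right]\right) + 4\sum_{\substack{i,j=1 \\ i<j}}^{n}\left(\mathrm{cov}\left[x_i, x_j\right]\mathrm{cov}\left[y_i, y_j\right] + \mathrm{cov}\left[x_i, x_j\right]M\left[y_i\right]M\left[y_j\right] + \mathrm{cov}\left[y_i, y_j\right]M\left[x_i\right]M\left[x_j\right]\right).$$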

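Since $M[x_i]$, $M\left[x_i^2\right]$, $M\left[x_i^3\right]$, $M\left[x_i^4\right]$, $\mathrm{cov}[x_i, x_j]$ are uniformly bounded and since

$$\frac{1}{n^2}\sum_{\substack{i,j=1 \\ i<j}}^{n}\mathrm{cov}\left[x_i^2, x_j^2\right] \to 0, \qquad \frac{1}{n^2}\sum_{\substack{i,j=1 \\ i<j}}^{n}\mathrm{cov}\left[x_i^2, x_j\right] \to 0, \qquad \frac{1}{n^2}\sum_{\substack{i,j=1 \\ i<j}}^{n}\mathrm{cov}\left[x_i, x_j\right] \to 0$$

as $n \to \infty$, we get

$$\frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^{n}\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2\right] \to 0.$$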


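Then, by Markov's theorem,

$$\frac{1}{n}\sum_{i=1}^{n}\left(x_i^{(a_1 a_2 \cdots a_N)} - x_i^{(b_1 b_2 \cdots b_N)}\right)^2 \xrightarrow{P} 2\left(\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_N b_N}\right)\sigma_N^2 + \left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_{N-1} b_{N-1}}\right)\sigma_{N-1}^2 + \cdots + \left(1 - \delta_{a_1 b_1}\right)\sigma_1^2\right).$$

Next, by Slutsky's theorem,

$$d_n\left(x^{(a_1 a_2 \cdots a_N)}, x^{(b_1 b_2 \cdots b_N)}\right) \xrightarrow{P} \sqrt{2\left(\left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_N b_N}\right)\sigma_N^2 + \left(1 - \delta_{a_1 b_1}\delta_{a_2 b_2}\cdots\delta_{a_{N-1} b_{N-1}}\right)\sigma_{N-1}^2 + \cdots + \left(1 - \delta_{a_1 b_1}\right)\sigma_1^2\right)}.$$

This completes the proof of the main theorem. □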


□ 


5 Conclusions 

The present paper is an extension of the author's paper [11], in which a procedure for constructing finite
ultrametric spaces was proposed based on the generation of a finite number of independent random points
with independent coordinates in $\mathbb{R}^n$. It was shown that, for a special class of distribution laws of points
in $\mathbb{R}^n$, the normalized matrix of Euclidean distances on the set of points converges in probability as $n \to \infty$ to
an ultrametric matrix. In the present paper, we extend the result of [11] to the case when the coordinates
of random points are statistically dependent. Our main result is Theorem 3 of Section 4, which states
that, under a number of conditions on the expectations of the variance and the covariance of coordinates
of random points, the matrix of Euclidean distances of a random realization of a finite number of points
tends in probability as $n \to \infty$ to an ultrametric matrix. The proof of Theorem 3 depends, in particular, on
the law of large numbers in the form of Markov's theorem. We obtain the explicit form of this ultrametric
matrix and show that it is completely determined by the expectations of the conditional variances of the
coordinates of points. The paper also contains two illustrative examples obtained by computer simulation
of random points in Euclidean spaces of large dimensions. These examples illustrate the working principle
of the theorem for specific distributions of random points in two cases: when the coordinates of points are
dependent and when they are independent.

An interesting question is how the mechanism of generation of the ultrametric considered above may
manifest itself in the processing of data sets pertaining to real systems. In the present paper (as in the previous
one) we do not pose the problem of describing the real systems in which the above scenario of origination
of ultrametric structures admits an exact realization. Nevertheless, we may adduce some simple but general
arguments supporting the idea that the proposed scenario may be realized in feature sets of
real objects. Assume that we are given some set of homogeneous objects. Assume that each object is assigned
a feature set which can be described by a point in a multivariate coordinate space $\mathbb{R}^n$, where $n$ is sufficiently
large. We also assume that the feature sets are, in general, statistically independent random variables
and that the specific distribution law of the feature vector of each object depends on several external factors,
which, in turn, depend upon a certain set of random parameters. We pose the problem of classifying some
sample of such objects, whose solution is based upon comparing normalized Euclidean distances between
the objects. We suppose that a specific realization of a sample of objects is such that all objects from
the sample can be subdivided into subfamilies which satisfy the following condition: for each subfamily of
objects all random parameters corresponding to external random factors have the same realization, whereas
for distinct subfamilies of objects the random parameters corresponding to external random factors have
different realizations. It is easily seen that under the above assumptions the principal conditions of Theorem 3
should be satisfied. Moreover, if the hypotheses on the expectations, variances and covariances of coordinates
(features of objects) of Theorem 3 are also satisfied, then one may expect with large probability that the
metric matrix of normalized Euclidean distances on the space of sample object features is close to an
ultrametric one. In this case, this will result in a clusterization of objects from different subfamilies of the
sample in terms of their proximity with respect to the Euclidean metric.

It is especially noteworthy that in constructing the metric matrix on the set of random points in $\mathbb{R}^n$ we
used the Euclidean metric in $\mathbb{R}^n$. Nevertheless, such a choice of a metric is not unique. For example,
the distance between random points in $\mathbb{R}^n$ can be measured with respect to the normalized Minkowski metric
$d(x, y) = \left(\frac{1}{n}\sum_{i=1}^{n}\left|x_i - y_i\right|^{\alpha}\right)^{1/\alpha}$, which is the Hamming metric with $\alpha = 1$, the Euclidean metric with $\alpha = 2$,
and the Chebyshev metric with $\alpha \to \infty$. It becomes an interesting question to determine, for various $\alpha$, the
conditions on the distribution of points in $\mathbb{R}^n$ under which the metric matrix of their random
realization will tend in probability to an ultrametric matrix as $n \to \infty$. We hope to examine this question
in the near future.
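
For concreteness, the normalized Minkowski distance mentioned above can be written as a short function. This is a minimal sketch with an assumed name, `minkowski_normalized`; the case $\alpha \to \infty$ is handled separately as the largest coordinate difference, since the factor $(1/n)^{1/\alpha}$ tends to 1 in that limit.

```python
def minkowski_normalized(x, y, alpha):
    """Normalized Minkowski distance ((1/n) * sum |x_i - y_i|^alpha)^(1/alpha)."""
    n = len(x)
    if alpha == float("inf"):
        # Chebyshev limit: the largest coordinate difference
        return max(abs(u - v) for u, v in zip(x, y))
    return (sum(abs(u - v) ** alpha for u, v in zip(x, y)) / n) ** (1.0 / alpha)
```

With $\alpha = 2$ this coincides with the normalized Euclidean distance used throughout the paper.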
Figure 2: The degree of ultrametricity $\bar{U}$ of the metric space $U_n^{(N)}$, averaged over 10 realizations, versus
the dimension $n$ of the Euclidean space $\mathbb{R}^n$ (horizontal axis: dimension $n$), with the parameters of generation
of random points $N = 3$, $p_1 = p_2 = p_3 = 2$, $\sigma_1 = \sigma_2 = \sigma_3 = 10$ and with the correlation parameter of
coordinates $\lambda$ equal to 0.8, 1.2, 2 and 10.